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Abstract 

Estimation of the level set of a function (i.e., regions where the function exceeds some value) is an im- 
portant problem with applications in digital elevation mapping, medical imaging, astronomy, etc. In many 
applications, the function of interest is not observed directly. Rather, it is acquired through (linear) pro- 
jection measurements, such as tomographic projections, interferometric measurements, coded-aperture 
measurements, and random projections associated with compressed sensing. This paper describes a new 
methodology for rapid and accurate estimation of the level set from such projection measurements. The 
key defining characteristic of the proposed method, called the projective level set estimator, is its ability 
to estimate the level set from projection measurements without an intermediate reconstruction step. This 
leads to significantly faster computation relative to heuristic "plug-in" methods that first estimate the 
function, typically with an iterative algorithm, and then threshold the result. The paper also includes a 
rigorous theoretical analysis of the proposed method, which utilizes McDiarmid's inequality and charac- 
terizes the estimator's performance in terms of geometry of the measurement operator and ^i-norm of 
the discretized function. 

1 Introduction 

Level set estimation is the process of using indirect observations of a function / defined on the unit hypercube 
[0, l] d to estimate the region(s) where / exceeds some critical value 7; i.e., S* = {x e [0, l] d : f(x) > 7}. 
Accurate and efficient level set estimation plays a crucial role in a variety of scientific and engineering tasks, 
including the localization of "hot spots" signifying tumors in medical imaging |241 I16j , significant photon 
sources in astronomy [5D] , and strong reflectors in remote sensing [27] . 

In this paper, we consider making observations of the form y = Af + n, where / is a discretized 
version of /, A is a (discrete) linear operator that may not be invertible, and n is additive noise that 
corrupts our observations. For instance, y might correspond to tomographic projections in tomography 
[19j . interferometric measurements in radar interferometry [33] . multiple blurred, low-resolution, dithered 
snapshots in astronomy [3T], or random projections in compressed sensing systems Q3 [7J [TTJ [35]. Our 
goal in this y = Af + n setting is to perform level set estimation without an intermediate step involving 
time-consuming reconstruction of /. There are two reasons for this. First, level set estimation without 
reconstruction of / would allow design of sequential measurement schemes optimally adapted to the function 
of interest. For instance, in tomography we would like to estimate the level set, S* , quickly from an initial set 
of observations so that additional observations focused on S* can be collected immediately, resulting in an 
overall low radiation dose [22, 26, 25] . Some recent works [T8l [17] have provided theoretical characterizations 
of the significant benefits associated with certain sequential measurement schemes; the method proposed 
in this paper may facilitate the use of such schemes in time-sensitive or computational-resource limited 
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applications. Second, "plug-in" approaches that estimate / and threshold the estimate / to extract S* are 
notoriously difficult to characterize; the performance hinges upon the statistics of the estimation error / — /, 
which for most reconstruction methods are unknown (with the possible exception of the first moment). More 
generally, reconstruction methods aim to minimize the total error, integrated or averaged spatially over the 
entire function. This does little to control the error at specific locations of interest, such as in the vicinity 
of the level set boundary. Finally, the Vapnik Principle [IS] states that one should never solve a complex 
problem as an intermediate step towards solving a simple problem. 

1.1 Problem formulation 

In this work, we observe samples of a function / supported on [0, l] d of the form 

y = Af + neR K (1) 

where A £ M. KxN is a linear operator that is assumed to be known with K often less than N, n is bounded, 
independent and zero-mean nois^J and / G R N corresponds to integration samples of /; i.e., 



/. = , f ( f{x)dx (2) 



vol (Ci 



for i = 0, 1, . . . , N — 1, where the cells CVs are obtained by partitioning [0, l) d into nonoverlapping hypercubes 
such that each Ci has sidelength N~ x / d and volume 1/N. In this work, we consider N to be dyadic (powers 
of two). A 7-level set in this discrete setting can be written as S N — {i : ft > 7} where the subscript N 
signifies that the level set is a function of the TV-dimensional discrete signal Throughout this paper the 
dependencies of the continuous-domain level set S* and the discrete-domain level set S N on 7 are implicit. 
Our main goal is to estimate S* from y without reconstructing the underlying signal /. In the discussion that 
follows, we propose a level set estimation method to estimate S N directly from y and show that S N — > S* 
as N — > 00 in Sec. [3] Similar to [45] , the error metric used to measure the closeness between S N and a 
candidate estimate Sn is defined as 

e N (s N ,s* N ) = ± \-r~fi\ ( 3 ) 

where A(S N , Sn) = (S% \ Sn) U (Sn \ S^)} denotes the symmetric set difference between Sn and S^. 
Note that ^ can be interpreted as an empirical, weighted probability of error under the counting measure 
where the weights depend on the amplitude of the signal relative to the level set threshold 7. This error 
metric penalizes the errors along the level set boundaries with sharper intensity variations more than the 
errors in regions where the intensity around the level set boundary does not vary much. 

Instead of working directly with the error metric, we make use of the risk of a candidate set Sn, defined 

as 

R N (S N ) = ^Yl e ^ S ") W 

i 

where 

= (7 -/i)[%es*} (5) 

is the loss function and !{£} = 1 if event E is true and otherwise. The loss function in |5j) measures the 
distance between the signal value at location i, fa, and the threshold, 7, and weights this distance by —1 or 
1 according to whether i € Sn or not. The loss function li(Sjsf) is positive if i £ A(S N , Sn) and is negative 
otherwise. To see this, observe that for all i € S N \ Sn, (7 — fi) < an d [l{i S s N } — !{i£Siv}] = — 1- A 



lr This assumption is reasonable, since, in practice, the noise is always bounded due to hardware limitations in physical 
sensors. In cases where Gaussian noise models are more appropriate, it is straightforward to see that, given some small S > 0, 
we can always find a bound eg so that the noise magnitude exceeds eg with probability less than 5. 

2 In this work, we adopt the terminology of "function" for the continuous-domain / and "signal" for its discrete counterpart 

/■ 
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similar explanation holds for all i <E Sn \ S N as well. Note that the risk is related to the error metric defined 
in Q by virtue of the fact that 



R N (S N ) - R N (S N ) = ^5>- f<) ([: 



2 



]T | 7 -/ i | = 2 £j v(S J v,^). 



(6) 
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*eA(s^,Sjv) 



Finding an estimator that minimizes the excess risk error Sn(Sn, S N ) is thus equivalent to finding an 
estimator that minimizes Rn(Sn) since Rn(S n ) is simply a constant with respect to Sn- 

This paper presents an optimization problem for choosing an estimate of S N from the data y and 
theoretical characterization of Sn(Sni S n ) when / consists of samples of a piecewise smooth function. 

2 Our contribution and relation with previous work 

In this work, we demonstrate that, subject to certain conditions on A and the l\ norm of /, the level 
set S* can be estimated quickly and accurately via S N without first reconstructing /. For A = I, [35] 
provides minimax optimal, tree-based level set estimation techniques to extract S* from noisy observations 
y = f + n e R N without estimating /. We cannot directly apply the strategy in [IS] to our problem since 
A I. Instead, we draw on the key idea of constructing proxy observations 



from the literature on support detection of sparse signals (see, e.g., [HI US] [14]) and then exploit some of the 
important insights from [45] to address our problem. A part of this work was previously published in [21]. 
This work, however, significantly expands on the previous work and presents new and tighter theoretical 
bounds and extensive simulation experiments. 

Before we present our estimation method, we discuss prior work on level set estimation and sparse support 
detection. 

2.1 Previous work on level set estimation 

Large volumes of research have been dedicated to the problem of estimating level sets of an unknown density 
or a regression function / from its noisy measurements by either using plug-in estimators that find level sets 
of estimates of / [301 113 1321 133 HE] or direct methods that do not involve an intermediate reconstruction 
step [JH B6 , 45 , 34] [35] . Plug-in methods are easy to implement and in some cases lead to theoretical results 
on consistency and convergence based on some smoothness assumptions on the function of interest. For 
instance, [30l QUI [32] propose plug-in methods based on kernel estimators and show that they exhibit fast 
rates of convergence. Mason and Polonik |28j derive the asymptotic normality of the symmetric difference 
between a true level set and an estimate derived using a kernel density based plug-in estimator. Singh, 
Scott and Nowak [37] propose a plug-in method based on a regular histogram partition that minimizes the 
Hausdorff distance between the true and the estimated level sets. They also demonstrate that the proposed 
method adapts to unknown regularity parameters and achieves near minimax optimality on a wide variety 
of density function classes. Though such methods seem attractive, they solve a much harder problem as an 
intermediate step to solving a set estimation problem, which is simpler than function estimation. Vapnik's 
principle stated earlier, together with the minimax convergence results shown in the context of classification 
problems in |46j tell us that plug-in methods are often suboptimal to direct estimation methods. As a result, 
in our work, we focus on direct set estimation strategies. 

Several researchers have considered direct set estimation methods for the case A = I. In [41], Tsybakov 
proposes a direct density level set estimation method that finds piecewise polynomial estimators of the true 
level set and achieves optimal minimax rates of convergence. The estimation method in |41j is hard to 
compute and cannot be directly extended to our problem where A =/= I. In |34j . the authors show the 




(7) 
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theoretical and practical advantages of reducing a regression level set estimation problem to a cost-sensitive 
classification problem. Previous work by one of the coauthors |45j draws on the relationship between a 
classification framework and level set estimation framework, and proposes a set estimation method based 
on dyadic decision trees by exploiting some of the ideas from |36| . A closely related work to density level 
set estimation is the estimation of minimum volume sets such that their masses are at least greater than 
some specified 7 [35]. In that work, the authors discuss tree-based techniques along the lines of [Ml E5] 
and provide universal consistency results and rates of convergence. Inspired by the near optimality of the 
tree-based methods in performing level set estimation, we build on some of the key ideas of [36l [45j [34] to 
attack the level set estimation problem for the case of y = Af + n. 

We briefly review the basic idea in [IS] on which our set estimation strategy is built upon. In [35], the 
goal was to design an estimator of the form 

S = argmini?jv(5) + pen(S), 

S£S M 

where Sm is a class of candidate estimates, Rn is an empirical measure of the estimator risk based on 
N noisy observations of the signal /, and pen(-) is a regularization term which penalizes improbable level 
sets. That work described choices for Rjy, pen(-), and Sm that made S rapidly computable and minimax 
optimal for a large class of level set problems. Specifically, [35] derived a regularizer pen(-) using Hoeffding's 
inequality and developed a dyadic tree-based framework to obtain S. Trees were utilized for a couple of 
reasons. First, they both restricted and structured the space of potential estimators in a way that allowed 
the global optimum to be both rapidly computable and very close to the best possible (not necessarily 
tree-based) estimator. Second, they allowed the estimator selection criterion to be spatially adaptive, which 
was critical for the formation of provably optimal estimators. Note that while we intend to build upon the 
insights developed in [45) , an extension of those techniques to the case of proxy observations in ^ is made 
nontrivial because of two reasons. First, the effective noise n' is nonzero mean because of the presence of 
(A T A — /) /. Second, and most importantly, n' is correlated due to the non-unitary nature of A, which 
prohibits the use of Hoeffding's inequality for characterization of the penalty term. 



2.2 Relationship with previous work on sparse support detection 

Sparse support detection is the problem of detecting a set of locations S*^ = {« : /; ^ 0} corresponding to a 
discrete signal / e R N , given observations of the form in This is a special case of level set estimation 
and the two are equivalent if / is nonnegative and 7 = 0. The idea of constructing proxy observations 
z to deduce certain properties of the underlying / has been successfully employed in recent compressed 
sensing and statistics literature to solve the problem of support detection of a discrete / having no more 
than to non-zero entries; see, e.g., [3] [13l [T4] [T7] . Specifically, it is established in [3] that the support of 
an m-sparse / can be reliably and quickly detected from appropriately thrcsholded proxy observations with 
overwhelming probability as long as A satisfies a certain, easily verifiable coherence property. The success of 
this thresholding method stems primarily from the sparsity assumption on /. However, when / is not sparse, 
as is the case in level set estimation, simply thresholding the proxy observations will result in numerous false 
positives and misses as discussed in detail in the numerical experiments in Sec. [7j see Figs. 2(a)[ |2(b)[ 



and Figs. 3(a) through 3(c) These results clearly suggest that we cannot simply use a support detection 
algorithm and an optimally chosen threshold to achieve an accurate level set estimation. In contrast, our 
methodology relies on a novel two-step approach that enables us to work with proxy observations without 
requiring / to be sparse. 



3 Fast level set estimation from projection measurements 

In order to extract the 7-level set of / from y, we propose a novel two-step procedure. First, we construct a 
proxy of / according to ^ , which allows us to arrive at the canonical signal plus noise observation model. 
Next, we perform level set estimation on the proxy observations z, rather than on y, using a method similar 
to the one derived in |45j . We refer to the resulting estimator as the projective level set estimator. Note that 
for any unitary A, z in ([7| reduces to y = f + n with ft having independent, zero-mean entries. However, 
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Figure 1: An example level set estimate S £ Sm where the domain of the underlying signal is [0, l] 2 . Shaded 
regions are estimated to be outside the level set. 



for non-unitary A, the proxy defined in (|7j) creates a signal-dependent interference term (A T A — J) / and 
a zero-mean correlated noise term A T n. The "proxy error," denoted by n' — (A T A — i) f + A T n 7 makes 
a direct extension of the level set approach discussed in |45j to our problem nontrivial. This is because the 
choice of the penalization term in [45] is a nontrivial function of the proxy error term. 

Intuitively, if we try to make a decision about each Zi independently, then we would be vulnerable to noise 
(see, e.g., Figs. |3(b")j and 3(c) ). On the other hand, if we consider patches of z^s and force each patch to be 



wholly inside or outside the level set estimate, then we increase our robustness to noise but also increase our 
bias. Ideally, we want spatially adaptive patches that allow us to balance between an accurate approximation 
of the true level set boundary and estimator variance. It is in this vein that we theoretically analyze the 
impact of n' on the level set estimation problem and use our analysis to develop a spatially-adaptive, dyadic, 
tree-based level set estimation approach that adapts to both the interference and the correlated noise term. 

The algorithm we propose basically works by using z to find a partition of / into a collection of disjoint 
sets of "pixels." For each set, we determine whether it is inside or outside the level set with a simple 
voting procedure — i.e., we determine whether the majority of the Zj's in the set are greater than gamma. 
Thus searching for the optimal level set estimate amounts to searching for a good partition of / and then 
performing empirical risk minimization, defined in ([£]) in the sequel, on that partition. We restrict our 
attention to partitions defined using binary trees because they yield tractable algorithms and, in the case 
where A = I, minimax optimality |45j . 

Specifically, let Sm be a collection of candidate level set estimates for a dyadic M (i.e., M — 2 q for 
some positive integer q), where each S £ Sm — following the analysis in |45j — is obtained by recursively 
partitioning the domain of / in dyadic intervals. The number of dyadic intervals along different coordinate 
directions is not required to be the same. In other words, each cell in the partition can potentially have 
different sidelengths and the sidelength of the smallest cell is 1/M. An estimate S £ Sm is obtained by 
assigning each cell in the partition to be inside or outside of the level set. Fig. [I] shows one such estimate 
in two dimensions where the shaded regions are the partition cells that are estimated to be outside the level 
set. Though we do not specify M in terms of N here, we derive an upper bound on M as a function of N 
that achieves a certain expected excess risk in Theorem [2] 

Given z, our goal is to find a level set estimate 

5jv = argmini?jy(5) — Rn(S^)— axgmm Rjy(S) (8) 

S£Sm S£Sm 

where Rn(-) is defined in Q and the second equality follows since Rn(S^) is a constant. (Note that 
Sn = £jv if &n &M-) Since / is unknown, Rpj(S) cannot be computed; instead, let us consider an 
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empirical risk of the form 



1 N 

^(5) = -^( 7 -^)[l { 



(9) 



We show that finding an estimate that minimizes a penalized empirical risk results in an estimate that 
asymptotically approaches Sn- Specifically, we find 



Sn = argmin R N (S) + pen Ar (S), 

S€S M 



N — >-oo 



(10) 



subject 



where j>ea N (S) is an interference-dependent penalty term that yields Rn(Sn) — Rn(Sn) 
to certain conditions on A, which generally require K — > oo as N — > oo. The penalty term plays a major 
role in our estimation strategy and is crucial in finding estimates that hone in on the boundary of the level 
set S^f. We thus focus on designing a spatially adaptive penalty pen N (S) that promotes well-localized level 
sets with potentially non-smooth boundaries. Let ir(S) be the partition induced by an estimate S € Sm, 
i.e., n(S) is the collection of all leaves in the estimate S € Sm- Fig'- [l] shows a partition induced by one of 
the estimates S € Sm where every white or gray shaded block is a leaf. Then the risk of S in each of its leaf 
L € 7r(5) is given by 

1 N 



H£(L)=0}\ HiEL} 



where the label £(L) = 1 if L is in the level set and otherwise. Note that Rj^(S) — J^Le^iS) Rn{L). 
We design a spatially adaptive penalty term by analyzing Rn(L) — R^(L) within each leaf separately. To 
facilitate our analysis, let us define 

1 N 

Rn{L) = — (7 - E N) [!{*(£)=!} - %(£)=0}] 



Then 



iZjv(i) - ^iv(i) = LRiv(i) - Rn(L) + R N (L) - R N (L) 



< 



i ^ [(E [ Zi ] - fi) + (Zi - E fe])] [I { ,(£) =1} - l W L)=0}] h 

1=1 

1 N 

- 2J (E [Zi] - fi) [!{*(£)=!} - I{4(£)=0}] I{i£L} 



(11) 



1 " 



{<(£)=!} -^(£)=0}J Jl{iGL} 



Note that while T\ is a measure of the bias in z, T 2 is a measure of the concentration of z about its mean. 
Assuming without loss of generality that the columns of A have unit £2 norms, one can easily see from ^ 
that 



N 

f i+ £ + n) 



J =1,3'/* 



where A^ denotes the i th column of A and (•, •) denotes the usual innerproduct. Since A is given, and n 
is zero mean, the term 



JV 



E [m] - fi 



G 



in Ti is the signal-dependent interference term at the i th location due to the signal energies at other locations. 
We upper bound T\ by the l\ norm of / and the worst-case coherence of A (defined in the statement of 
Theorem [I]), and bound T 2 using McDiarmid's inequality [53] and sum the risk in each leaf of the estimate 
S to arrive at the following result. 

Theorem 1 (Concentration of risk around the empirical risk) Suppose that the entries of noise n 
are bounded between [c£,c u ]. Then, for S € [0, 1/2], with probability at least 1 — 26, the following holds for 
all S £ Sm ■' 



Rff(S) — Rn{S) 



where Wf^ = J2i \fi\ is the h norm of f, 



(12) 



pen N {S) = 

Lett(S) 



[log(2/<5) + [L]log2] \c u -Q| 2 E« ei 



(13) 



is the penalty term, 



u(A) = max 

i,je{l,...,N},ijtj 



A®, A® 



is the worst-case coherence of A and [L] is the number of bits in a prefix code used to uniquely encode the 
position of a leaf L in the tree. 



The proof of this theorem is provided in Section |5.1[ The above bound holds for any prefix code \L\ . 
In order to achieve the error rates in Theorem [2] we use a certain prefix code, which is discussed before 
the statement of Theorem [2] Note that the bounds in ( 12 ) and ( 13 ) depend on (a) the signal-dependent 
interference term in ^ through \\fWi, (b) the variability of noise through \c u — C(\, (c) the choice of A 
through fJ,(A), (d) the depth of each leaf through [L], (e) the size of each leaf through j^L > A^~) 
and (f) the parameter 8. Ideally we would like to minimize Rn{S) to obtain Sn in Since Rn(S) is 
bounded by R]y(S)+pen N (S), minimizing the bound ( 12 ) will ensure that our estimate Sjv in ( 10 ) is as close 
to Sn in (J8JI as possible. In order to minimize the risk difference in ( | 1 2[ ) . one needs to choose an estimate 
S G Sm that has the least pen N (S) in (13). The penalty term in (13) is directly proportional to the number 



of leaves in the partition n(S) and the size of each leaf through the term ^ i j eL (A^ , A^^. As a result, 
searching for an estimate 5* € Sm that minimizes ( 10 1 will favor estimates with few, deep leaves that hone 
in on the boundary of the level set. 

The theoretical analysis of our method is significantly different from the analysis in -4S] because of the 
statistics of the noise term n' in our problem. However, this only changes the way the penalty is defined 
in our setup. As a result, we can adapt the computational techniques discussed in [45] to compute our 
estimator in an efficient way. Our method is computationally efficient since the proxy computation needs at 
most O(KN) operations (fewer if A is structured; e.g., A is a Toeplitz matrix) and the level set estimation 
method needs 0(N log N) operations, as noted in [45] . 

In addition to the concentration of risk around the empirical risk, we can also utilize the results of 

Theorem jlj to upper bound the expected excess risk E R(Sn) — R(S*) 
distribution, in terms of the problem parameters, where 



taken with respect to the noise 



R(S) 



[0,1] 



(7 - /(*)) [hxes] ~ hx?s}] dx 



is the definition of risk in continuous-domain. The expected excess risk is a measure of the effectiveness of our 
level set estimator. Before studying it, we make certain assumptions about the smoothness of / in the vicinity 
of the level set boundary. Let OS* represent the level set boundary corresponding to S* — {x : f(x) > 7}. 
We assume that / is in a box-counting function class T>box(k, 7, cq, ci, C2) for Co, ci, C2 > and 1 < k < 00 
[4"5] such that the following hold: 
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(a) If we partition [0, l] d to m d equisized cells for m < M, with each of them having a sidelength of 1/m 
and volume m~ d , then the number of such cells intersected by the level set boundary N$* (to) < c\m d ^ 1 . 
This ensures that dS* varies smoothly and is not an irregular, space-filling curve. 

(b) pl = J xGL dx < coX(L) for any measurable L g [0, l] d where A is a Lebesgue measure. 

(c) For all dyadic m, there exists some Sj n S m thcit minimizes the symmetric difference between any 
S € S m and the true level set S* in terms of the Lebesgue measure A, i.e., 

S* m = axgminX(A(S,S*)). (14) 

ses m 

For this S^, the excess risk in the continuous domain follows 

e(S* m ,S*) = R(S* m )-R(S*)= [ \l-f{x)\dx<c 2 m-\ (15) 

Ja(s;s*j 



Parameter k and the assumption on the excess risk in (151 allow us to study the fluctuations of / around 
dS* and thus examine the behavior of £(5^, iS*) in the vicinity of the level set boundary. If / exhibits a 
very small fluctuation around dS* , then £(5^, S*) is small even if the symmetric difference A(5* S^) is very 
large, since the excess risk is weighted by how close is / to 7. In other words, a high value of k indicates 
that / varies very smoothly around the level set boundary and a low value of k means that there is a jump 
in / around dS* . 

Recall that S*^ is obtained by partitioning the space [0, l] d to TV equisized cells of sidelengths N~ d and 



assigning each cell to be inside or outside of the level set. From (15 1, for m — N 1 ^, 

R(S* N )~R{S*) < c 2 N- R ' d . 

Thus S* N — > S* as N — > 00 and estimation of S* via is reasonable. 

To achieve the results of Theorem [2j we adapt the prefix code proposed in [351 ISS]- According to [151155] , 
a leaf L of a level set at depth j of the tree can be uniquely encoded using a total of j(log 2 d + 2) + 1 bits. 
Specifically, one needs j + 1 bits to encode the depth of the leaf, j bits to encode whether each of its ancestors 
corresponded to a left or a right branch of the tree, and j log 2 d bits to encode the orientation of each of the 
j branches. 

Before we state our main theorem, let us clarify the notation used in the following. For a given set of 
sequences a n and b ni a n =4 b n implies that there exists a constant C > such that a n < Cb n for all n and 
Oin x b n implies that there exists constants C\ and C2 such that C\a n < b n < Cia n for all n. 

Theorem 2 (Upper bound on the expected excess risk) If f € 2?box(K) 7> COi Ci, C2) is discretized 
according to — B < f(x) < B for x £ [0, l] d , —B < 7 < B, and the estimate Sn is chosen ac- 
cording to with pen JV (S , 7v) defined according to Theorem. [7J then, for a given A, d > 2, and for 



M 



(1 



JV 



l/d 



E 



111^ ) - W ' ^ ( l|A|l ^° giV ) " 2 +M(A)||/|| 1 (16) 



where the expectation is with respect to the noise distribution, \\A\\ 2 is the spectral norm of A, \\A\\2 = 
\J Amax (^4 T ^.) 7 an d n(A) is the worst-case coherence of A. 



The proof of this theorem is given in Section [5. 2| This theorem tells us how the expected excess risk scales 
with the dimensionality N of the underlying signal /, the l\ norm of /, the choice of A and the smoothness 
of the underlying function around the level set boundary through the parameter n. For a unitary matrix A, 

\\A\\ 2 = 1 since its eigenvalues are all equal to 1, fi(A) = and E R(Sn) ~ R{S*) 4 [^fT^j 2K+d ~ 2 , which 
is the minimax optimal rate derived in [45] without the projection matrix A. Since in practice A is dictated 
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by the physics of the measurement system, it is not always unitary. In such cases, the above theorem tells 
us how any given A increases this bound. For some A, such as the one discussed in the following section, 



fi(A) — > as N and K go to oo since A T A N ' I Sjz^°° j N te that can be specified in terms of 



the continuous-domain function / by noting that \\f\\ l < -A^l/H^ 
function / is not completely positive (or negative). 



although it is a loose bound when the 



Corollary 1 (Performance with random projections) If the entries of A E R KxN are drawn from 
Af (0, 1/K), and the columns of A are normalized to have unit 1% norm, then 



E 



R(S N ) - R{S* 



=4. 



lOffiVA ^ + d-2 



N 



K 



N 



-y/T2K\ogN 
Vl51og7V 



VK - y/UTogN 

holds with probability at least 1 - ^-^k+n) + N -2 + n^-ij as \ ong as 60 log TV < X < . 



(17) 



The proof of this corollary is provided in Sec. |5.3| The result above yields an upper bound on the expected 
excess risk as a function of the dimensions of the projection operator A and \\f\\ v In words, this corollary 
states that the expected excess risk in the case of random Gaussian projections is minimized if the number 
of measurements K scales linearly with N and increases if K scales sublinearly with N . Dependence of the 
estimator's performance on the i\ norm of / is due to the interference term (A T A — JJ / that arises during 
the proxy construction. The foregoing results provide key insights into ways by which we can minimize the 
expected excess risk and improve performance, as discussed in detail in the following section. 



4 Performance improvement via projected median subtraction 



So far we have shown that the signal-dependent interference term in ^ leads to a penalty term proportional 
to ll/H 1 in (12). This implies that the interference in z and thus the performance of our method may worsen 
with the increase in H/H-p which is indeed confirmed by the experimental results in Sec.[7j To find a way to 
minimize the signal-dependent interference, let us write / = / + Al, where A is a constant DC offset such 
that 



/ ^ll/llr 



(18) 



If we have access to an estimate A of A, then we can minimize the signal-dependent interference by subtracting 
a projection of this constant offset to obtain 



y = y AX1 = A(f + Al) + n - AX1 



= A( f + (A - A) 1 j + n « Af + n. 



assuming that A w A. The proxy observations in this case reduce to 

z = A T y w A T Af + A T An = / + (A T A - l)f + A T An. 



Since S* N = {i 



fi > 7} — : fi > 7}j where 7 = 7 — A, we can estimate from z using our level set 
estimation method discussed in the previous section. 



If we let A to be the median of /, then we can easily show that ( 18 1 holds for this particular choice of A 



Note that if A is the median of /, then half the pixel values of / are below the median and half of the pixel 
values are above the median. Let Q — {i : > A} and G c = {i '■ fi < A}. The cardinality of Q is \Q\ = N/2 
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for N everj^] By the definition of median, \Q\ = \G C \- Then 



= H/-ai|| 1 = Y,\fi-M + Y,\fi-M 



ieG 



E 



tee 



tee 



In practice, however, estimation of the median of / from y might be hard, though the estimation of the 
mean of / might be tractable. For instance, if we construct A' = [^] (i.e., the first row of A' is 1 T ), then 



y 



A'f + n 



, and A = y[/N = Q2i fi + n\)/N = A + m/N. If the observation noise is negligible, 



or if N is large, then A sa A and we can perform projected mean subtraction, instead of a projected median 



subtraction, to reduce the signal-dependent interference. While ( 18 1 does not always hold if A is the mean of 



/, simulation results in Sec. [7] suggest that projected mean subtraction can result in significant improvement 
in performance. 



5 Proofs of theorems and corollaries 

This section presents the proofs of all the theorems and corollaries stated before. 
5.1 Proof of Theorem [I] (Concentration of risk ) 

Let us begin by bounding T\ and T 2 in (1 1 lh separately. Let p L = ^2 ieL jj be the ratio of the number of 
observations in leaf L to the total numberof observations N. From the statistics of z, we can bound T\ as 
follows: 



T X < ~ ^ ]/,||(A«,A0-))||[l Wi)=1> -I W) =o}]ll { 



< 



ft(A) 



N 



E E i/i 



N 



E Ei£i-i/< 



iet \i=i 



/'(A^ll/ll: -4^E^I' 



iV 



(19) 



where the second inequality is due to the fact that | [l{^(i)=i} ~^{f(L)=o}]| = 1 an d |(-AW,A^ )| < A*(^) 
for all j ^ i. 

We can bound T 2 , which is a function of z, using McDiarmid's inequality by observing that Zi = 
Ylj=i a j,iVj an d t nus T2 can be written as a function of the independent random variables {yi}- Specif- 
ically, 



T-2 



g(.Vu-,VK) %\g{y u ...,y K )] 
3 We do not consider N to be odd since our recursive dyadic partitions require N to be in powers of two. 



K 



Ew(i)-E 
3=1 



A" 

E 



yjWj(L) 
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where 



w 



} - Hi(L)=0}\ 



ieL 



N 



(20) 



We formally state McDiarmid's inequality in the following lemma. 



Lemma 1 (McDiarmid's inequality) Suppose that y\, . . . , yx are independent random variables and that 
a function g(yi, . . . , yx) satisfies 



sup 

yi,...,y K ,y' 



g(yx,...,y K ) -g(y {p 1] ,y' p , y P +i, ■ ■ ■ ,m) 



(21) 



where p = 1, . . . , K, {y{\f = i are the values that the independent random variables {y{\f =1 take and y( p 1 - 1 
{yi, . . .,y p -i}. Then, for any e > 



\g(yi, ■■•,Vk) -E [g{yi, . . -,yK)]\ > e) < 2exp 



-2e 2 



T K c 2 



(22) 



We can thus bound using McDiarmid's inequality as long as g(yi, ■ ■ ■ , ytc) = £j=i yj w j(L) satisfies (21 1. 
Since the entries of noise n are bounded, it can be shown that the function 5(2/1, . . . , yx) satisfies the bounded 
differences property [29] as follows: 



sup 

Vi,---,VK,yi 



i(yi, ■ ■ ■ , m) - g(y {p y P +i ,---,Vk) 



sup 

Vu—,VK,V' B 



K K 

E yjWj (l) - Y yj w i ( L ) ~ y'p w p( L ) 

j =1 j=hj¥v 



= \w P {L)\ _ sup \y p - y' p \ 
yi,---,VK,y' 



= K(i)l_ sup ^ (iW T ,/)+Sp-((i (p)T ,/)+s;) 



ni ,...,njc ,ti 



\w p {L)\ sup \n p -n'\ = \w p (L)\\c u -ce\. 



ni,...,riK,n' 



(23) 



where {ni}f =l are the values that the random variables {n i }fL 1 take. Now, using McDiarmid's inequality 
we arrive at the following result: 



-2e 2 



P(T 2 >e)<2exp 

\L P =i \wp{L)\ \c u ~a\ 

Note that we can expand J2 P =i \ w p(L)\ 2 m (24 1 from |2o| as 



(24) 



f> p (L)| 2 =£ 

p—1 p—1 

K 

= £|[: 
p=i 

A' 



[l{«(L)=i} ~ %(i)=o}] £ "jy 



i6L 
,2 



E 

p=i 



2 



N 



E a p, 
~N 



ieL 



e(e^) e^ 

p=l ViSi / \jeL 



E E ^ f E ^) - ^ E E ^ 

ieLjeL \p=l / ieL jeL 



(25) 
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By substituting (25) in (24) and by equating the right hand side of (24) to 6l € (0, 1/2) and solving for e, 
we can show that, with probability at least 1 — 25 l, 



/ log(l/<5 L )| Ctl -Q| 2 E» jei <AW,AO)) 



27V 2 



(26) 



Applying the bounds in ( 19 1 and ( 26 ) to (11) we can see that with probability at least 1 — 25 l , the following 
holds: 



R N (L)-R N (L) < n(A)p L \\f\\! - 



N 



5> 



ieL 



/log(l/ ( 5 L )| Ctl - Q | 2 E ljei (AW,AO)) 



2N 2 



Thus for a given S € Sm, the risk difference 



Rn(S) — Rn{S) 



is upper bounded by summing the bound 



corresponding to each leaf separately. Since ^2l^tt(S)P l = 1 ana - Y^Lew(S) EieL = WfWi we have 

'N - V 



R N (S)-R N (S) <fi(A) 



N 



1/11!+ E 



l\og(l/5 L )\c u -c e \'j: h3e L(^,AU)) 



2N 2 



with high probability. If we let 5^ = 52~^ L ^ +1 * > where [i] is the number of bits required to uniquely encode 
the position of leaf L, then it is straightforward to follow the proof of Lemma 2 in [JS] to show that the 
bound above holds for every S £ Sm , which leads to the result of Theorem [T] 

5.2 Proof of Theorem [2] (Performance analysis) 

In order to analyze the performance of our estimator, we will draw upon the proof techniques and the 
associated performance analyses in previous works on classification and level set estimation [36| 145] . Note 
that some of the steps in our analysis that are adapted from [3S1 0S] are repeated here for readability. 

The proof of this theorem follows by relating the continuous-domain risk of a level set S £ Sm to its 
discrete counterpart and exploiting the results from Theorem [l] By expanding R(S) for any S € Sm in 
terms of the discretization of / in ([2]), we have, 



R(S)= (i-/W)[i |ies} -i ws} J^ 

J x 

N 

= E / (7-/(z))[l{c ieS }-%7^ 

i=l Jc i 
N 

= (Tvol (Q) - vol (Q) fi) [l {CieS} - I {Ci ?s } ] 



dx 



i=l 
N 

E 



7__ fi 
N N 



where the second equality holds since Cj is contained either in S or in the complement of S. Since Sn £ Sm, 
R(Sn) = Rn{Sn)- Let us consider some S' N € Sm that minimizes the penalized excess risk between any 
S e Sm and the true level set S*, i.e., 

S' N = min [R(S) - R(S*) + 2pcn JV (S')]. 

SeSm 
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From the definitions of Sn in ( 10 ) and S' N , and the results of Theorem[lj the following hold with probability 
at least 1 - 26 for 6 e [0, 1/2]: 



IlyS_x) — R[S ) — Rn(Sn) — R(S*) 

< R N (S N ) + pen N (S N ) - R{S*) 

< R N {S' N ) + pen N (S' N ) - R(S*) 

< R N {S' N ) + pen N (S' N ) + pen(S' N ) - R(S* 
= R(S' N ) + 2pen N (S' N )-R(S*) 

= min [R(S) - R(S*) + 2pen N (S)]. 



(27a) 
(27b) 
(27c) 
(27d) 
(27e) 
(27f) 



Let f2 denote the set of all proxy observations z for which (12 1 from Theorem [I] hold and 5 = 1/N. Since 
-B < f(x) < B for all x € [0, l] d and —B < J < B, 



E 



E 



R(S N )-R(S*) 

= E 

:k\r n {s n )- R(s*)\n 



Rn{Sn) — R(S* 



R N (S N ) - R(S*)\Q ¥{fl)+E R N {S N )- R(S*)\n c P(O c ) 

2 



E 



R N (S N ) - R(s*)\n c 
2 



N 



< mm [R(S) - R(S*) + 2pen N (S)] + AB x ^ 



(28) 



where the first term in (28 1 is due to (27f) and the second term is due to the boundedness assumption on 
f(x) and 7. Specifically, 



Rn{Sn) — R{S*) = R(Sn) — R(S*) 



(7 - /(*)) 
: / ABdx = AB 



l {x€S N } ~ ^{x^Sn} ~ HxES*} + Hx$S*} 



dx 



since 7 — f{x) < 2B and l{ xe s} ~ ^{x^s} < 1- Rewriting (28) we have, 

E[i?(5 ; 7V )-i?(S'*)l < min [R(S) - R{S*) + 2pen N (S)] 



N 



< min min [R(S) - R(S*) + 2pen N (S)} 

X<m<M oGo m 

< min R(S*J - R(S*) + 2pen N (S*J 

Km<M 



N 
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N 



< min m K 

Km<M 



2pen N (S^ 



N 



(29) 
(30) 
(31) 
(32) 



where in (|31j) is defined in ([14]) and (|32jis due to ([15). 

Let us now bound pen N (S^ n ) given in (13). To this end, let us rewrite 

'N- V 



pen N (S* 



N 



M (A)||/|| 1+ pen^(^) 



(33) 



where 



, \^ 1 /[log(2iV) + [L]log2]|c tl -Q| 2 E Jjei (^ ) ,^ ) ) 
pen N (S m )= ^ — y 



and bound peii l N (S* n ). 
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To keep the notation simple, let \L\ = (X^gl -0 be * ne number of pixels in leaf L and Al he & K x \L\ 
matrix formed by collecting the columns of A corresponding to the indices i £ L. Note that \L\ = J^ieL 1 = 
Np L . Let 



q L = pog(2JV) 
Using this notation, we can write 



log 2] (\c u - Cl \ 2 /2)p L . 



(34) 



pen w 



JV ( S *J= E \ 



pog(2JV) + [L] lo g 2]|c M - Q 



1 (|L|x 



(|£|xl) 



it 



E 

^■"■(Sm) 

E 



N 



1 (|L|x 



Np L 



(\L\xl) 



Ae,1(|L|x1) 



< 



E 



E 



Le 
A 



Al1(|L|x1) 



l(|i|xl)| 



< u 



i 2 E 



(35) 
(36) 



where the inequality in (35) follows from the definition of the spectral norm of Al given below: 

A L x 



= max 
2 x^o \\x\\ 2 



> 



Al1(|£|x1) 



H\L\xl)\ 



The term X^lse^s* ) Vl L 7^ m '^61 can now ^ e bounded from above by using the proof techniques in [31)1 145] . 
Previous work [35] showed that for a binary tree with N leaves at its finest level, [L] =<! log AT. From the 
definition of the box-counting function class, pl = f L dx < cq\(L) = cq2^^ l ^ where j(L) is the depth 

corresponding to leaf L of the tree. Note that ph — SigL F = Sigz, fc ^ x = Il^ x = P L - By substituting 
these results in ( 34 1 we have 



E 



=4 



5 \ 



pog(2JV) + logJVlog2] (|c-^| 2 /2)2-i( L ) 



N 



\ 



< 



< 



\ 



[log(2AT) - 


- log N log 


2] (|c 


~c,| 2 /2) 


N 


[log(2iV) - 


- log A" log 


2] (|c 


-c,| 2 /2) 



N 



E 7 ^ 

3=1 



\ogN 



AT 



(37) 



where J = log 2 A" is the deepest level of the binary tree, Tj is the number of leaves at depth j of the tree, 
c is a constant that is a function of the upper and lower bounds, c u and ci, on noise, and (371 follows 



straightforwardly from the proof of Theorem 6 in [36] . By substituting ( 37 ) in ( 36 ) , we have the following 

pen^Hm d / 2 -y 



'log N 



(38) 
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From (32), (33) and (38), 



E 



R(S N )-R(S*) 



4 mm \ m - + m ^-\r^\\A\\ 

Km<M V N 



f ....... , &B 

W(A)||/|| 1 + — 



4 mm 

Km<M 



N 



N 



We can easily show that m 



N 



\A\\ 2 lo S N 



minimizes the expression above. Since 1 < n < oo, 



the bound on m is largest for k = 1. Exploiting this result and the fact that m < M, we have that for 



M )? 



( N V 

\\\A\\l\a S N ) 



E 



R{S N )-R{S*) 





'||A||=logJV\ 







M(A)||/|| r 



(39) 



5.3 Proof of Corollary [T] (Performance with random projections) 

The proof of this corollary is obtained by bounding the spectral norm of A and the worst-case coherence 
of A with high probability. Let A <E M. KxN be a matrix whose entries are i.i.d. draws from Af (0,1/K). 



Each column of A is then simply obtained by normalizing the columns of A, that is, A^ z ) 
i € {1, . . . , N}. The bound on ||A|| 2 is obtained by first showing that 

2 



\AW\ 



\A\\* 2 <q 



for 



(40) 



for some constant q and then bounding 



using the results from random matrix theory. In particular, 



[4"4~] states that the spectral norm of an K x N subgaussian matrix M is upper bounded by ||iW|| 9 < 
c(^/K + VN^J with probability 1 — exp {—c(K + N)). This result can be straightforwardly extended to show 
that 



< c 



i 



(41) 



with probability 1 — exp (—c(K + N)). We show that ( |40[ ) holds with high probability by taking the following 
approach: 



max 1 1 Ax | 

x: || x || 2 — 1 



max 
v-T. ] p]\\a^\ 



max 

£C:||£c||.,= 



max 

a= : 1 1 a= 1 1 .-, = 



X 



AW) 



E 



E a 



max ■ 

p#0 



E> 


E, 


a i,jPj 


2 


E, 


v) 


AW) 


2 
2 



where 



AO) 



and p = [pi P2 ■■■ pjv ] T . Following the proofs of Lemma 1 in [23J and Theorem 8 in [3], 



we can easily show that 



AW) 



> 1 



v/1 ^ sjv with probability at least 1 - iV~ 3 for any j E {1, 

2 



Applying the union bound over every possible j € {1, . . . , N}, 
least 1 — iV~ 2 . Using this result in the above equation we have, 



A« 



> 1 



V121og Af 



I j4.ll 2 < max ■ 
p 



E, 


Ej OijPj 


2 


E^K 1 "™) 



-, _ V121og N 
1 



with probability at 



(42) 
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with probability exceeding 1 — N 2 . By substituting (41 1 in (42 1, and applying the union bound, the following 
holds with probability exceeding 1 — exp (— c{K + N)) — N^ 2 : 



\A\\ 2 <c 



y/N/K + 1 
~ -y/12 log N 

Vk 



/K + VN 
yjK - y/Y2K log N ' 



(43) 



The rest of the proof follows straight from Theorem 8 of [3] which states that 

fi(A) < 



K-y/12\0gN 



(44) 



with probability exceeding 1 — UN 1 as long as 601ogJV < K < ,j I , The bound in (44 1 together with 
the bound in ( 43 1 and the result of Theorem [2] yields the result of Corollary [I] 



6 Relationship with plug-in methods 

The success of wavelet-based methods in estimating a piecewise smooth function from noisy measurements 
suggests a potential extension of such methods to the problem of level set estimation [T^] . For instance, one 
possible approach for level set estimation from projection measurements is to first estimate the underlying 
signal / from proxy measurements z using wavelet-based denoising methods and then threshold the resulting 
estimate at level 7. Estimating / from y through an intermediate proxy construction step is similar to the 
iterative hard thresholding method in compressive sensing literature with just one iteration |6j. While such 
plug-in estimation techniques using wavelet-based methods offer practical solutions to the level set estimation 
problem, their estimation performances are not yet understood. 

The proposed multiscale, partition based set estimation method with proxy measurements can be thought 
of as a combination of an iterative hard thresholding method with just one iteration, and wavelet-based 
denoising ideas. Specifically, our partition-based method is similar in spirit to the wavelet-based denoising 
ideas using the unnormalized Haar wavelet transform. Both wavelet-based methods and our method rely on 
the spatial homogeneity of the underlying signal / to perform level set estimation. The difference between 
the two methods stems from the way in which the wavelet coefficients are thresholded in each case. While 
the threshold in the wavelet-based method is chosen to minimize the mean squared error, our method 
thresholds the coefficients at levels that are tailored to the level set estimation problem. Since the proposed 
method shares similar ideas with wavelet-based methods, the proof techniques presented in this paper could 
potentially be extended to wavelet-based methods in order to characterize their estimation performances. 

Compressive sensing theory presents a variety of algorithms such as iterative hard thresholding [5] , basis 
pursuit [5], orthogonal matching pursuit [30]) LASSO and total- variation based methods [5] to reliably 
estimate / from y. One can readily use such algorithms to first estimate / and then threshold it or use the 
method in [44] to estimate the level set. However, there are a couple of issues in using these plug-in methods 
to perform level set estimation. First, these approaches aim to minimize the mean squared error over the 
entire image. This, however, does not guarantee minimization of errors close to the level set boundaries, 
which is critical to the characterization of level set estimation performance. Second, the iterative nature of 
these algorithms make them computationally intensive and time consuming. 



7 Experimental results 

Due to the lack of a theoretical performance comparison between plug-in methods and our method, we 
present an empirical comparison of these methods in this section by conducting experiments on a test image. 
Simulation results discussed below demonstrate that the proposed partition-based, multiscale method using 
proxy observations has the following advantages: (a) it is a powerful tool to perform direct level set estimation 
from projection measurements, (b) it allows us to exploit the spatial homogeneity of the underlying function 
to perform set estimation, (c) it performs an order of magnitude better than thresholding methods that 
obtain level set estimates by simply thresholding the proxy observations at level 7, and (d) it yields results 
that are comparable to the results obtained using wavelet-based thresholding approaches. 
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(a) True signal / e R 128 * 128 (b) Level set = {i : fi > 

such that ft e [44,239]. We 125} (white pixels) such that 

measure K = 8192 Gaussian \S^\ as 0.4285/V where TV = 

random projections of this im- 128 X 128. 
age. 



Figure 2: Snapshots of the true signal and its desired level set. 



In order to test the effectiveness of our projective level set estimator, we conduct experiments on a test 



image of size 128 x 128, shown i n Fig. 2(a) In these experiments, we are interested in estimating 7-level set 



A" 



of this test image shown in Fig. 2(b) from noisy, projection measurements of the form y = Aj + n G 
for K < N — 128 x 128, without reconstructing / from y. The entries of the projection operator in these 
experiments are drawn from J\f (0, 1/K) and the noise is distributed asn~A/' (0, 1). Note that we consider 
a Gaussian noise model here, which is unbounded. However, it is bounded with high probability and our 
theory can also be extended to this case. We compare the performance of our method with the performances 
of the following approaches using the excess risk error metric defined in Q: 

(a) Thresholding method, where the estimate Sj is simply obtained by thresholding the proxy observations 
z at level 7; that is, S 7 = {i : Zi > 7}. 

(b) Risk-optimal thresholding method, where the estimate is obtained by thresholding z at a level 7 that 



minimizes the excess risk; that is, — {i : zi > 7} where 7 = argmin sn 



(c) Non-iterative wavelet-based plug-in method, where the estimate S w is obtained by first estimating / from 
z using translation invariant wavelet denoising, and then thresholding the resulting estimate / at level 
7; that is, S w — {i : > 7}. In these experiments we perform wavelet denoising using Daubechies-4 
wavelets and soft thresholding, where the threshold is chosen to minimize the excess risk. 



(d) Total-variation (TV) based plug-in method, where the estimate Stv is obtained according to Stv 
I ; : f\ TV ^ > 7|. The estimate f^ TV ' > of the input image / is obtained from y by solving 



f{TV) 



arg mm 



y-Af 



TV 



where / is the total- variation norm of /, and r is a user-defined parameter that balances the 

TV 

log-likelihood term and the regularization term. Algorithms such as the two-step iterative shrinkage and 
thresholding (TwIST) method provide a way to efficiently solve for the above optimization problem [S] . 
In our experiments, r is chosen to minimize the excess risk. 

In these simulation experiments, we compute the excess risk clairvoyantly based on the knowledge of /. We 
obtain the estimate S using our projective level set estimator according to Sn — argmin SgSM R(S)+Tpen(S) 

with a scaling factor t, which is chosen to minimize £^<Sjv, S^j . In these experiments, we use M = N. 

We evaluate the performance of all the competing algorithms discussed above, with and without projected 
mean subtraction discussed in Sec. [4] The number of observations used in these experiments is K = N/2 = 



17 



shows the 



3(c) shows 



8192. Fig. 3(a) shows the proxy observations obtained without mean subtraction. Fig. 3(b) 
level set estimate obtained by simply thresholding the proxy observations at level 7 and Fig. 
the estimate obtained by performing the risk-optimal thresholding method. These results demonstrate that 
thresholding noisy, proxy observations results in several false positives and misses. Though the wavelet- 
based plug-in method yields better results in comparison, as shown in Fig. |3(d)| the estimate is still severely 
oversmoothed and noisy. The estimate obtained using our projective level set estimator is shown in Fig 



This approach yields lower excess risk compared to the other three approaches discussed above, preserves 
some of the fine details, but still performs some oversmoothing. Fig. |3(f)| shows the results obtained using 
the TV-based plug-in method. This method yields the best results compared to the other approaches and 



yields the smallest excess risk, at the expense of first estimating the signal. Fig. 3(g) plots excess risk as 
a function of the number of measurements K < N = 16384 for all competing methods. These plots are 
obtained by averaging the results obtained over 200 different noise and projection matrix realizations. 

Figs. |4^a) through [4^g) show the improvements in results obtained because of the projected mean sub- 
traction. The improvements stem from the fact that the proxy measurements are less "noisy" after the 
projected mean subtraction. This subtraction operation lowers the excess risk of the estimates obtained 
using every method discussed above, except for the TV-based plug-in method, which performs very well in 
practice irrespective of mean subtraction. TV-based reconstruction is in general implemented using iterative 
algorithms where convergence is achieved if the mean squared error between estimates obtained in succes- 
sive iterations does not change beyond a user-specified tolerance value. The TwIST algorithm used in our 
simulation study uses the proxy measurements to initialize the iterative process and stops iterating when 
convergence is achieved. As a result, only the number of iterations to achieve a specified convergence will 
change depending on the quality of the proxy observations and not the final estimate. This explains why 
the TV-based results are insensitive to projected mean subtraction. 

The TV-based method seems to outperform our projective level set estimator since we evaluate the 
performance of these methods based solely on the excess risk and not on the computational resources required 
to achieve that excess risk. In that sense, this comparison is somewhat unfair. A more meaningful comparison 
would be to either evaluate the excess risk obtained within some unit time, or compare the time taken by 
different approaches to achieve a desired excess risk as the problem size N changes. To make the comparison 
fair, we ran our projective level set estimator for different problem sizes, used K ~ N/3 observations to 
get our estimates, recorded the excess risk obtained in each case, and ran the TV-based plug-in method to 
achieve the same excess risk in each case. In other words, instead of using the conventional convergence 
strategy in TV-based reconstruction algorithm, we stop iterating if the excess risk is less than or equal to 
the one obtained using our method. We compare the computational time required for both these methods 
as a function of problem size. Figs. [§a) and|J;b) show a 512 x 512 image and its corresponding level set, 
respectively. Note that the image used in above experiments is a cropped version of the image in Fig.[5|a). 
We cropped this image in order to get images of different sizes. In particular, we used images of size ixi where 
t = 76, 96, 116, . . . , 376. Fig. [5](c) shows the time-gap between these two methods to achieve similar excess 
risks, as a function of the number of pixels in the input image. These plots show that the computational 
time taken by TV-based plug-in method dramatically increases with problem size, where as the computation 
time required by our projective level set estimator increases much more gracefully with problem size. 

In conclusion, the experimental results indicate that estimating the underlying signal using TV 
regularization-based plug-in methods yields more accurate level set estimates compared to the ones ob- 
tained using our projective level set estimator. However, the real strengths of our method are two-fold. 
First, we can reliably perform real-time level set estimation compared to plug-in methods as shown by the 
time-gap versus problem size plot in Fig. [5|c) . Second, we can use our level set estimate to discard regions 
where the levels of interest are not present and design adaptive measurement schemes to hone-in on the 
regions of interest. Such an adaptive measurement scheme is especially helpful in very high-dimensional 
settings where the cost of collecting measurements and performing reconstruction tends to be extremely 
high. 

8 Conclusion 

This work proposes a theoretically sound and computationally efficient tree-based approach for extracting 
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(e) Estimate obtained us- (f ) Estimate obtained us- (g) Plot of excess risk as a function 
ing the projective level set ing the TV-based plug-in of K < N = 16384 without per- 
cstimator; £jy = 3.593 estimator; ejv = 0.5596 forming the projected mean sub- 
traction. 



Figure 3: Snapshots of the simulation results obtained (without performing the projected mean subtraction) 
from observations of the form in (111). 




(a) Proxy observations (b) Estimate obtained (c) Estimate obtained (d) Estimate obtained 

after projected mean sub- using the thresholding using the risk-optimal using the wavelet-based 

traction method; ejv = 8.562 thresholding method; method; ejv = 2.393 

ejv = 8.231 




(e) Estimate obtained us- (f ) Estimate obtained us- (g) Plot of excess risk as a function 
ing the projective level set ing the TV-based plug-in of K < N = 16384 after perform- 
estimator; £jy = 1-924 estimator; ejv = 0.5593 ing mean subtraction. 



Figure 4: Snapshots of the simulation results obtained (after performing projected mean subtraction) from 
observations of the form in (fll). 
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(a) (b) (c) 

Figure 5: Comparison of the computation times of TV-based plug- in method and our projective level set 
estimator, (a) True signal / e M. 512 * such that f l € [0, 239], (b) level set S* N = {i : fa > 125}, and (c) a plot 
of computation time as a function of problem size to approximately achieve the same excess risk. Images 
of different sizes (76 x 76 to 376 x 376) cropped from (a) were used in this experiment. The plots in (c) 
indicate that the time it takes for the TV-based plug-in method to achieve the same excess risk obtained by 
the projective level set estimator increases dramatically with problem size. 

level sets of a function from projection measurements without reconstructing the underlying function. The 
simulation results presented in Sec. [7] suggest that the proposed method may facilitate fast and accurate 
level set estimates from tomographic projections in medical imaging, Fourier projections in intcrferometry, 
or coded projections in compressive optical systems. One of the key advantages of our approach is that many 
of the operations on the proxy data are easily parallelizable. For instance, in problems where the domain 
of the signal of interest is very large, we can compute the proxy observations, partition the proxy data into 
different patches, run our estimation algorithm on each patch separately and merge the results to identify 
the regions that correspond to the level set. In applications such as medical imaging, the time saved by 
collecting fewer projection measurements and parallelization can be significant and crucial. 

Empirically, the accuracy of the projective level set estimate is comparable to that of a similar scheme 
based on wavelet thresholding or an iterative method with TV regularization. Currently, however, there is 
no theoretical support for these alternatives. Recent work studying the performance of so-called "analysis 
regularization" [T5] may lead to an improved understanding of theoretical performance bounds for the 
TV approach, but as we show here this iterative solution requires significantly more computational resources. 
Our approach is much more similar in spirit to the wavelet-based approach, and the theoretical techniques 
employed in our analysis may lead to an improved understanding of this and other fast, non-iterative ap- 
proaches. Furthermore, adaptive sampling schemes such as the one discussed in [17] suggest a potential 
extension of our method. Specifically, [T7] proposes collecting noisy measurements of a sparse signal, esti- 
mating its support and collecting more measurements based on the estimated support to adaptively focus 
the computational resources on regions of interest. The underlying assumption in such "distilled sensing" 
[18] schemes is sparsity. Since our level set estimation method offers a way to estimate the level set of a 
function without requiring sparsity, we expect it to facilitate the development of new adaptive sampling 
routines that perform better than the ones proposed in earlier works. 

References 

[1] Special Issue on Compressive Sampling, IEEE Signal Processing Magazine, 25 (2008). 

[2] I. Ayed, A. Mitiche, AND Z. Belhadj, Multiregion level-set partitioning of synthetic aperture radar 
images, IEEE Transactions on Pattern Analysis and Machine Intelligence, (2005), pp. 793-800. 

[3] W. U. Bajwa, R. Calderbank, and S. JAFARPOUR, Why Gabor frames? Two fundamental measures 
of coherence and their role in model selection, J. Commun. Netw., (2010), pp. 289-307. 

[4] W. U. Bajwa, R. Calderbank, and D. G. Mixon, Two are better than one: Fundamental param- 
eters of frame coherence, Appl. Comput. Harmon. Anal., 33 (2012), pp. 58-78. 



20 



[5] J. BiOUCAS-DiAS and M. Figueiredo, A New TwIST: Two-Step Iterative Shrinkage/Thresholding 
Algorithms for Image Restoration, IEEE Transactions on Image Processing, 16 (2007), pp. 2992 -3004. 

[6] T. Blumensath AND M. Davies, Iterative hard thresholding for compressed sensing, Applied and 
Computational Harmonic Analysis, 27 (2009), pp. 265-274. 

[7] E. Candes, J. Romberg, and T. Tao, Robust uncertainty principles: Exact signal reconstruction 
from highly incomplete frequency information, IEEE Transactions on Information Theory, 52 (2006), 
pp. 489 - 509. 

[8] E. Candes and T. Tao, Near optimal signal recovery from random projections: Universal encoding 
strategies, IEEE Transactions on Information Theory 52 (2006), pp. 5406-5425. 

[9] S. Chen and D. Donoho, Basis pursuit, in Signals, Systems and Computers, 1994. 1994 Conference 
Record of the Twenty-Eighth Asilomar Conference on, vol. 1, Oct-2 Nov 1994, pp. 41-44 vol.1. 

[10] A. Cuevas, W. Gonzalez-Manteiga, and A. Rodri'guez-Casal, Plug-in estimation of general 
level sets, Australian & New Zealand Journal of Statistics, 48 (2006), pp. 7-19. 

[11] D. Donoho, Compressed sensing, IEEE Transactions on Information Theory, 52 (2006), pp. 1289-1306. 

[12] D. Donoho and I. Johnstone, Ideal spatial adaptation by wavelet shrinkage, Biometrika, 81 (1994), 
pp. 425-455. 

[13] A. Fletcher, S. Rangan, and V. Goyal, Necessary and sufficient conditions for sparsity pattern 
recovery, IEEE Transactions on Information Theory, (2009), pp. 5758-5772. 

[14] C. Genovese, J. Jin, and L. Wasserman, Revisiting marginal regression, Arxiv preprint 
arXiv:0911.4080, (2009). 

[15] R. Giryes and M. Elad, RIP-based near-oracle performance guarantees for SP, CoSaMP, and IHT, 
IEEE Trans. Signal Processing, 60 (2012), pp. 1465-1468. 

[16] Z. Harmany, R. Willett, A. Singh, and R. Nowak, Controlling the error in FMRI: Hypothesis 
testing or set estimation?, in 5th IEEE International Symposium on Biomedical Imaging: From Nano 
to Macro, IEEE, 2008, pp. 552-555. 

[17] J. Haupt, R. Baraniuk, R. Castro, and R. Nowak, Compressive distilled sensing: Sparse recovery 
using adaptivity in compressive measurements, in Forty-Third Asilomar Conference on Signals, Systems 
and Computers, IEEE, 2009, pp. 1551-1555. 

[18] J. Haupt, R. Castro, and R. Nowak, Distilled sensing: Selective sampling for sparse signal recovery, 
in Proc. 12th International Conference on Artificial Intelligence and Statistics (AISTATS), Citcscer, 
2009, pp. 216-223. 

[19] G. Herman, Image Reconstruction from Projections, The Fundamentals of Computerized Tomography, 
New York Academic Press, 1980. 

[20] W. Jang and M. Hendry, Cluster analysis of massive datasets in astronomy, Statistics and Comput- 
ing, 17 (2007), pp. 253-262. 

[21] K. Krishnamurthy, W. Bajwa, R. Willett, and R. Calderbank, Fast level set estimation from 
projection measurements, in Statistical Signal Processing Workshop (SSP), 2011 IEEE, IEEE, 2011, 
pp. 585-588. 

[22] A. Kyrieleis, V. Titarenko, M. Ibison, T. Connolley, and P. Withers, Region- of -interest 
tomography using filtered backprojection: Assessing the practical limits, Journal of Microscopy, (2010). 

[23] B. Laurent and P. Massart, Adaptive estimation of a quadratic functional by model selection, 
Annals of Statistics, (2000), pp. 1302-1338. 



21 



[24] C. Li, C. Xu, K. Konwar, and M. Fox, Fast distance preserving level set evolution for medical 
image segmentation, in Proc. 9th Intl. Conf. Control, Automation, Robotics and Vision, IEEE, 2006, 
pp. 1-7. 

[25] J. Ma, Iterative region of interest reconstruction in emission tomography, in IEEE International Sym- 
posium on Biomedical Imaging: From Nano to Macro,, April 2010, pp. 604 -607. 

[26] C. MAASS, M. Knaup, and M. Kachelriess, New approaches to region of interest computed tomog- 
raphy, Medical Physics, 38 (2011), p. 2868. 

[27] R. Marques, F. De Medeiros, and D. Ushizima, Target detection in SAR images based on a level 
set approach, Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions 
on, 39 (2009), pp. 214-222. 

[28] D. Mason and W. Polonik, Asymptotic normality of plug-in level set estimates, The Annals of 
Applied Probability, 19 (2009), pp. 1108-1142. 

[29] C. McDiarmid, On the method of bounded differences, Surveys in combinatorics, 141 (1989), pp. 148- 
188. 

[30] A. Moreno, Total error in a plug-in estimator of level sets, Statistics and Econometrics Working 
Papers, (2003). 

[31] R. Puetter, T. Gosnell, and A. Yahil, Digital image reconstruction: Deblurring and denoising, 
Annual Reviews on Astronomy and Astrophysics, 43 (2005), pp. 139-194. 

[32] P. Rigollet and R. Vert, Fast rates for plug-in estimators of density level sets, Bernoulli, 15 (2009), 
pp. 1154-1178. 

[33] P. Rosen, S. Hensley, I. Joughin, F. Li, S. Madsen, E. Rodriguez, and R. Goldstein, 
Synthetic aperture radar interferometry, Proceedings of the IEEE, 88 (2000), pp. 333-382. 

[34] C. Scott and M. Davenport, Regression level set estimation via cost-sensitive classification, IEEE 
Transactions on Signal Processing, 55 (2007), pp. 2752-2757. 

[35] C. Scott and R. Nowak, Learning minimum volume sets, Journal of Machine Learning Research, 7 
(2006), pp. 665-704. 

[36] C. Scott and R. Nowak, Minimax- optimal classification with dyadic decision trees, IEEE Transac- 
tions on Information Theory, 52 (2006), pp. 1335-1353. 

[37] A. Singh, C. Scott, and R. Nowak, Adaptive Hausdorff estimation of density level sets, The Annals 
of Statistics, 37 (2009), pp. 2760-2782. 

[38] D. Takhar, J. Laska, M. Wakin, M. Duarte, D. Baron, S. Sarvotham, K. Kelly, and 
R. BARANIUK, A new compressive imaging camera architecture using optical- domain compression, in 
SPIE, vol. 6065, 2006. 

[39] R. Tibshirani, Regression shrinkage and selection via the lasso, Journal of the Royal Statistical Society. 
Scries B (Methodological), (1996), pp. 267-288. 

[40] J. Tropp and A. Gilbert, Signal recovery from random measurements via orthogonal matching pur- 
suit, Information Theory, IEEE Transactions on, 53 (2007), pp. 4655 -4666. 

[41] A. Tsybakov, On nonparametric estimation of density level sets, Annals of Statistics, 25 (1997), 
pp. 948-969. 

[42] S. Vaiter, G. Peyre, C. Dossal, and J. Fadili, Robust sparse analysis regularization, Arxiv 
preprint arXiv:1109.6222, (2011). 

[43] V. Vapnik, The nature of statistical learning theory, Springer Verlag, 2000. 



22 



[44] R. Vershynin, Norm of a random matrix. Lecture notes on Non- asymptotic theory of random matrices, 
2006-2007. 

[45] R. Willett and R. Nowak, Minimax optimal level set estimation, IEEE Transactions on Image 
Processing, (2007), pp. 2965-2979. 

[46] Y. Yang, Minimax nonparametric classification-Part I: Rates of convergence, IEEE. Transactions on 
Information Theory, 45 (1979), pp. 2271-2284. 



23 



